ABSTRACT

Evaluation has been regarded by social scientists and administrators alike as the critical step required for the identification and implementation of action programs that are both effective and within the resources of the country. Yet very little is known about the actual evaluation process, its management, its methodologies, and its impact. The authors conducted a national survey, at the Russell Sage Foundation, of all federally funded evaluation studies in the human resource area. Studies were included if they were initiated in fiscal 1970 and had a budgetary allotment of $10,000 or more. The findings of this survey are described. Comparisons of both process and impact evaluation are made between investigations in education and those in other fields such as health, income security, public safety, and welfare. In addition to comparing the characteristics of education studies with those of others, data on the organizational arrangements under which the evaluation is carried out and on the characteristics of the researchers are presented by field. (Author)
Should Evaluation Researchers In Education Have
An Inferiority Complex?*

Ilene Nagel Bernstein          Howard E. Freeman          George W. Bohrnstedt

University of Minnesota Ford Foundation University of Minnesota 

Evaluation research has been defined as the application of the scientific method for assessing the effectiveness of an activity in producing some desired social goal. While the definition varies from author to author, the essential emphasis on sound empirical methodology, as opposed to nonsystematic, testimonial-like data, remains constant. And yet, despite at least a decade of encouragement by leading scholars in all academic disciplines, no one has yet stood up to praise the efforts of evaluators or to acclaim their studies as models of good research. With few exceptions, most evaluation research is still described as at best 'lacking' and at worst 'the major contributor to a Journal of Irreproducible Results.' The explanations range from a lack of adequate methodological techniques, through problems inherent in the politically charged nature of the research, to the poor training of those who do evaluation research. The relevant literature abounds with discussions of methodological problems and assertions about the conditions under which methodological adequacy is strengthened or weakened, and yet to date there have been no systematic studies of this activity, which engages so many professionals. Accordingly, we contended that two basic questions had to be addressed. The first was to ascertain the state of the art of evaluation research: where does the support come from, in what form, to whom are awards made, what kinds of organizations are awardees affiliated with, what is the academic background of the evaluation staff, and what are some of the structural conditions under which evaluations are carried out? The second was to ask, given that the quality of evaluation research varies, which factors might possibly account for some of that variance. While we could not make a definitive assessment of quality because of the limitations of our data, we could make some assertions, based on rigorous analyses, about the relative quality of research for a fairly large population. Additionally, by using the data we collected for the descriptive part of our study, we could test certain assertions about how quality varies with such dimensions as the nature of the award, the sponsoring agency, organizational arrangements, the academic discipline of the staff, and the like.

The research reported herein is part of a more comprehensive study of all evaluation research funded in fiscal 1970; here, however, we focus on academic discipline as a major variable, looking specifically at evaluators in education. Educationalists, as well as evaluations focusing on education, are particularly interesting for several reasons. First, perhaps the most controversial evaluation in recent years, or at least the one with the greatest publicity, has been the evaluation of an educational program, namely Head Start. One of the many consequences of the Head Start evaluation was to bring methodological debates about evaluation research to the forefront, along with a flurry of general discussion about this heavily funded, politically relevant, socially needed, but little-known activity in which so many persons are engaged. Second, education has long been thought of as the major means by which persons achieve social mobility. Since inequality was considered, at least by recent Democratic administrations, to be the single greatest problem to be overcome in order to achieve the Great Society, education was a natural setting for reform experiments. Thus, because of its social importance, it is an area of special interest.

In summary, the following presentation attempts to describe the state of the art of evaluation research for persons specializing in education, i.e. educationalists, as well as to assess how they fare relative to persons from other academic disciplines in the quality of their research. In order to provide a broader basis for comparison, supplementary data are cited as well, especially with respect to factors relating to 'quality of research.'

Since ours was an exploratory study, our hypotheses would be better categorized as 'generally stated assertions', some of which were theoretically based, others more experiential or stemming from fundamental beliefs about the social reality of the field of evaluation research. In any case, we asserted that the evaluation studies would vary considerably in the quality of their research and that this variation would be related to the size of the budget, the nature of the award, the length of time for the evaluation, the federal agency sponsoring the research, the type of organization conducting the research, the conditions under which the research was carried out, and so on. As for the direction of these assertions, i.e. which category would do better, in almost all cases the literature contained contradictory assertions. For example, some posited that since good research is costly, those with larger budgets would fare better. Others, on the other hand, assert that, besides the fact that much of the best social scientific research has been done with small budgets, large budgets are often allocated to evaluations of programs with loosely developed ideas in the hope that such a huge sum of money will not only provide for an adequate evaluation but simultaneously straighten out the ill-defined action program as well. On the more cynical side, some assert that agencies allocate large budgets to signal their sincerity, but that this in fact camouflages what may be an attitude of indifference to any evaluation results, regardless of findings. These critics imply that awards are knowingly given to incompetent researchers whose results can then be discredited because of methodological inadequacies. While we could continue to present the conflicting assertions for each of the variables named and make a prediction for each, we shall opt instead to explore the results rather than to posit hypotheses as such. Following the advice of Michael Scriven, one of education's leading contributors to the evaluation research literature, we will treat our "evaluation of evaluation research" as a formative evaluation and proceed accordingly.

Before we present our data, a brief description of the procedures used for sample selection and the methods of data collection seems in order. Our population of interest included all evaluation studies funded directly by agencies of the federal government in fiscal year 1970. Accordingly, we obtained from each agency a list of every award given in FY 70 for an evaluation of a large-scale social action program aimed at ameliorating some social problem in the areas of health, education, welfare, income security, public safety, housing, and manpower, with a research budget of at least $10,000. Every person on that list was then sent a copy of our questionnaire and asked to complete it. Eighty-four per cent of those persons returned a completed questionnaire (N = 318). Of those, 74% (N = 236) responded that they had indeed done an evaluation study and 26% (N = 82) responded that they had not. Thus the results presented below are based on data compiled from those 236 respondents.
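The response figures above can be checked with a few lines of arithmetic. The sketch below recomputes the two shares from the reported counts; the last step is our own inference, not a reported figure: assuming the 84% return rate was rounded conventionally, it lists the range of mailing-list sizes consistent with 318 returns.

```python
import math

returned = 318        # completed questionnaires (84% return rate, as reported)
did_evaluation = 236  # said they had conducted an evaluation study
did_not = 82          # said they had not

# The two groups should exhaust the returned questionnaires.
assert did_evaluation + did_not == returned
print(f"conducted an evaluation: {did_evaluation / returned:.0%}")  # 74%
print(f"did not: {did_not / returned:.0%}")                         # 26%

# Inference only: mailing-list sizes M for which 318 / M rounds to 84%,
# i.e. 0.835 <= returned / M < 0.845.
smallest = math.floor(returned / 0.845) + 1
largest = math.floor(returned / 0.835)
print(f"questionnaires mailed: between {smallest} and {largest}")
```

Because the paper reports only the rounded rate, the mailing-list size cannot be pinned down more tightly than this range.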

At the outset, let us cite some very basic descriptive facts about a) evaluations done by persons with degrees in education, b) evaluations focusing on education as a topic, and c) evaluations sponsored specifically by the federal agency most concerned with education, the Office of Education. Our data indicate that of all evaluation research funded in fiscal 1970, 11% was awarded to persons who indicated that their academic discipline or area of specialization was education. With respect to evaluations which focused on education as the primary concern, we found that 16% of all studies were so categorized. Lastly, when looking at the various agencies and their relative contributions, we find that the Office of Education sponsored 10% of all evaluation research in fiscal 1970. The variation in these three figures reflects not a discrepancy in the data but rather that: a) not all persons specializing in education study problems focusing on education; b) evaluations focusing on education were done by persons within a variety of academic disciplines, i.e. 8.1% by economists, 29.7% by psychologists, 2.7% by sociologists, 16.2% by 'others', and 43.2% by those specializing in education; and c) the Office of Education awarded research funds to persons specializing in disciplines other than education as well, especially psychologists. This brings us to an important point, which is that 19% of the psychologists were working on evaluations focused on education. This, coupled with the likelihood that educational psychologists often categorized themselves as psychologists rather than educationalists, leads us to suspect that a truer estimate of evaluation research being done by educationalists would be closer to 15% than to the 11% figure cited previously. While this may seem trivial now, it will become an important factor to recall later in our discussion of academic discipline and its relationship to the quality of evaluation research. Unfortunately, in the analyses which follow we can use only those 11% who stated that their academic discipline was education, since we have only inferential and not concrete evidence to the contrary.
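The percentages in this paragraph hang together, and a small calculation shows how the "closer to 15%" estimate can arise. The counts below are hypothetical: the paper reports only percentages, and 37 education-focused studies split 3/11/1/6/16 is simply one set of integers that reproduces every reported share (37 of 236 studies is about 16%). The discipline totals of 25 educationalists and 57 psychologists out of 233 are taken from Table 1. Treating all 11 psychologists on education-focused studies as educational psychologists is our assumption and makes the broadened figure an upper-end estimate.

```python
# Hypothetical reconstruction: integer counts chosen only because they
# reproduce the reported shares of education-focused studies by discipline.
education_focused = {
    "economics": 3,
    "psychology": 11,
    "sociology": 1,
    "other": 6,
    "education": 16,
}
n_focused = sum(education_focused.values())
shares = {field: round(100 * n / n_focused, 1) for field, n in education_focused.items()}
print(n_focused, shares)
print(f"education-focused share of 236 studies: {n_focused / 236:.0%}")

# Discipline totals from Table 1 (233 cases with discipline reported).
N, educ_total, psych_total = 233, 25, 57
psych_on_education = education_focused["psychology"]
print(f"psychologists on education-focused studies: {psych_on_education / psych_total:.0%}")

# Broadened estimate, assuming every psychologist on an education-focused
# study would otherwise have been classified as an educationalist.
broad = (educ_total + psych_on_education) / N
print(f"educationalists: {educ_total / N:.0%} as reported, {broad:.1%} broadened")
```

The narrow and broadened figures (about 11% and 15.5%) bracket the authors' suspicion that the true share lies nearer 15%.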
Turning our attention first to the source of funding, we find that educationalists received their awards for evaluation research primarily from the Social and Rehabilitation Service (SRS) (32%) and from the Office of Education and other HEW monies, not including NIH, NIMH, and SRS (36%). This was not an unexpected result. Perhaps of greater interest is that these awards tended to be larger rather than smaller: 64% of their awards were for $100,000 or more, as compared to 49% for all researchers combined.

Since the amount of the financial stipend is closely related to the nature of the award (contracts tend to have larger budgets than grants), we examined the relationship between academic discipline and nature of award and found that educationalists tended to work slightly more on grants (56%) than on contracts (44%). Interestingly, however, psychologists differ substantially, with 74% of their work supported by grants as opposed to 26% by contracts.

The last variable of interest in this section was the amount of time allotted for the evaluation. While we know from our data that this is related both to the size of the award (longer studies carry larger financial stipends) and to the nature of the award (longer studies are more likely to be grants than contracts), we thought it useful to see whether there was any systematic variation with respect to academic discipline. Educationalists were represented in roughly equal proportions across studies of different durations; psychologists, however, showed a very strong tendency to work on the longer studies. This pattern is closely related to the fact that psychologists also tend to work more on grants and on studies having large financial stipends.

Moving on to the second area of interest, let us describe some characteristics of the organizations with which the awardees were affiliated. First, with respect to the type of organization conducting the evaluation, we find again that educationalists do not vary as much as the other disciplines across the different organizational types. While the numbers in the cells are small, it is still surprising to note that 28% of the educationalists are located in profit-making corporations. The 24% in public or service agencies are not as unlikely, since they probably represent persons working in public departments of education. So that the distribution of educationalists versus other disciplines can be seen, this relationship is shown in Table 1.



Table 1

Type of Organization and Project Director's Academic Discipline

A. Percentaging by Type of Organization

Type of Organization        Educ.     Psych.     Econ.      Sociol.    Med., S.W.   Other      Totals
                                                                       & Psycr.
Profit                      11.3(7)   6.5(4)     29.0(18)   14.5(9)    1.6(1)       37.1(23)   100.0(62)
Non-profit and Research     9.6(5)    25.0(13)   15.4(8)    9.6(5)     13.5(7)      26.9(14)   100.0(52)
Educational Institution     8.6(7)    34.6(28)   9.9(8)     12.3(10)   19.8(16)     14.8(12)   100.0(81)
Public/Service Agency       15.8(6)   31.6(12)   10.5(4)    10.5(4)    15.8(6)      15.8(6)    100.0(38)
Totals                      10.7(25)  24.5(57)   16.3(38)   12.0(28)   12.9(30)     23.6(55)   100.0(233)

B. Percentaging by Discipline of Project Director

Degree of Project Director  Profit     Non-profit     Educational   Public/Service   Totals
                                       and Research   Institution   Agency
Education                   28.0(7)    20.0(5)        28.0(7)       24.0(6)          100.0(25)
Psychology                  7.0(4)     22.8(13)       49.1(28)      21.1(12)         100.0(57)
Economics                   47.4(18)   21.1(8)        21.1(8)       10.5(4)          100.0(38)
Sociology                   32.1(9)    17.9(5)        35.7(10)      14.3(4)          100.0(28)
Med., S.W. & Psycr.         3.3(1)     23.3(7)        53.3(16)      20.0(6)          100.0(30)
Other                       41.8(23)   25.5(14)       21.8(12)      10.9(6)          100.0(55)
Totals                      26.6(62)   22.3(52)       34.8(81)      16.3(38)         100.0(233)

Note: Cell entries are percentages, with frequencies in parentheses. χ² = 41.4, p < .001. Three cases omitted because of blank responses.



In terms of an overall picture, then, we have thus far noted that educationalists receive the greatest proportion of their funding from SRS and the Office of Education, that their awards are nearly evenly divided between grants and contracts with slightly more of the former, and that their budget allotments tend to be larger rather than smaller. Further, they tend to be found in roughly equal proportions in all types of organizations, and they show no propensity to do either longer or shorter studies.

Turning our attention to the conditions under which educationalists work, we will describe three specific aspects of those conditions: 1) the formal relationship between the evaluation and action components, 2) the working relationship regarding research decisions between the evaluation and action components, and 3) the working relationship regarding research decisions between the evaluation staff and funding agency staff.

These three variables are particularly important because of the possible implications they have for the quality of the research. Evaluators have spent many hours debating the merits and demerits of conducting evaluations in organizations which are simultaneously administering the action program being evaluated. Our literature is replete with references to the practitioner-versus-researcher problem and how it threatens objectivity and access to data. On the one hand, we hear that evaluators working within the same organization get caught up in the excitement of the program and as such lose their desire and/or ability to do a rigorous scientific evaluation which might then threaten the continuance of that program. The implication is that what emerges is soft testimonial data in place of hard-nosed science. On the other hand, we hear that evaluators operating independently of the action program are viewed by practitioners as 'heartless critics' who intend to build their professional reputations by capitalizing on the program's weaknesses. The implication here is that the practitioners, acting in self-defense, thwart the researchers by denying them access to the data. Again, the result is a weak methodological design. While the relationship to the funding agency has not been as popular a topic of debate, there too we see contradictory assertions, some positing that it is better to have more independence, others that it is better to have more interdependence. Rather than contributing to that ever-increasing 'debate' literature, we turn your attention to our data, first to describe the conditions under which evaluations are actually carried out and later, in the final section of the paper, to ascertain their relationship to the quality of research.

Within the total sample population, 38% conducted their evaluation research while working within the same organization that was conducting the action program. Importantly, however, fewer than 10% of these 38% were done where exactly the same persons administered the action program and played a major role in the evaluation research. For example, an experimental program using teaching machines to improve reading was administered by the local community school system. That same school system employed two of its assistant principals, who had research training as part of their background, to conduct the evaluation study. While the action staff and evaluation staff are composed of different persons, they are common members of a larger umbrella organization. Our data indicated that when the action and evaluation staffs were part of the same organization, they tended to be different rather than the same persons. Of interest here is that when examining this dichotomous variable, which we termed 'organizational arrangements', we found that 48% of the educationalists conducted their evaluations within the same organization. Here again, because this is a key variable, we show this relationship across disciplines. Again we ask that you especially note the distribution of psychologists.



Table 2

Organizational Arrangements and Project Director's Academic Discipline

A. Percentaging by Organizational Arrangements

Evaluation and Action by:   Educ.     Psych.     Econ.      Sociol.    Med., S.W.   Other      Totals
                                                                       & Psycr.
Same Organization           13.5(12)  34.8(31)   9.0(8)     6.7(6)     20.2(18)     15.7(14)   100.0(89)
Different Organizations     9.0(13)   18.1(26)   20.8(30)   15.3(22)   8.3(12)      28.5(41)   100.0(144)
Totals                      10.7(25)  24.5(57)   16.3(38)   12.0(28)   12.9(30)     23.6(55)   100.0(233)

B. Percentaging by Discipline of Project Director

                            Evaluation and Action Program by
Degree of Project Director  Same Organization   Different Organizations   Totals
Education                   48.0(12)            52.0(13)                  100.0(25)
Psychology                  54.4(31)            45.6(26)                  100.0(57)
Economics                   21.1(8)             78.9(30)                  100.0(38)
Sociology                   21.4(6)             78.6(22)                  100.0(28)
Med., S.W. & Psycr.         60.0(18)            40.0(12)                  100.0(30)
Other                       25.5(14)            74.5(41)                  100.0(55)
Totals                      38.2(89)            61.8(144)                 100.0(233)

Note: Cell entries are percentages, with frequencies in parentheses. χ² = 25.2, p < .001. Three cases omitted because of blank responses.



The second aspect of the structural conditions in which we were interested was the working relationship between the evaluation staff and action staff with respect to the major research decisions about the evaluation. This was especially important insofar as the evaluation research literature is replete with references to the implications this variable has for methodological quality. Looking, then, at educationalists, we find that 68% work in relationships which we categorized as 'joint planning', i.e. where research decisions are made by the two groups in close conjunction with one another. Again, the variable is so important that the table is shown below.



Table 3

Working Relationship between the Action and Evaluation Staffs and the Project Director's Academic Discipline

A. Percentaging by Working Arrangements of Action and Evaluation Staffs

Working Arrangement             Educ.     Psych.     Econ.      Sociol.    Med., S.W.   Other      Totals
                                                                           & Psycr.
Joint Planning                  17.2(17)  29.3(29)   7.1(7)     9.1(9)     21.2(21)     16.2(16)   100.0(99)
Action Agency Reviews Research  2.2(1)    20.0(9)    20.0(9)    11.1(5)    8.9(4)       37.8(17)   100.0(45)
Evaluation Agency Independent   7.9(7)    21.3(19)   24.7(22)   15.7(14)   5.6(5)       24.7(22)   100.0(89)
Totals                          10.7(25)  24.5(57)   16.3(38)   12.0(28)   12.9(30)     23.6(55)   100.0(233)

B. Percentaging by Discipline of Project Director

Degree of Project Director  Joint Planning  Action Agency      Evaluation Agency   Totals
                                            Reviews Research   Independent
Education                   68.0(17)        4.0(1)             28.0(7)             100.0(25)
Psychology                  50.9(29)        15.8(9)            33.3(19)            100.0(57)
Economics                   18.4(7)         23.7(9)            57.9(22)            100.0(38)
Sociology                   32.1(9)         17.9(5)            50.0(14)            100.0(28)
Med., S.W. & Psycr.         70.0(21)        13.3(4)            16.7(5)             100.0(30)
Other                       29.1(16)        30.9(17)           40.0(22)            100.0(55)
Totals                      42.5(99)        19.3(45)           38.2(89)            100.0(233)

Note: Cell entries are percentages, with frequencies in parentheses. χ² = 36.1, p < .001. Three cases omitted because of blank responses.
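The χ² values reported in the notes to Tables 1 through 3 can be checked directly from the cell frequencies. The sketch below computes Pearson's chi-square statistic and its degrees of freedom for each table, using the panel A counts as they reconcile with the row and column totals (columns ordered Educ., Psych., Econ., Sociol., Med./S.W./Psycr., Other). It should reproduce 41.4, 25.2 and 36.1 on 15, 5 and 10 degrees of freedom. It uses only the standard library; turning a statistic into a p-value would additionally require the chi-square distribution, for example via `scipy.stats.chi2.sf`.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


TABLE_1 = [  # profit; non-profit and research; educational institution; public/service agency
    [7, 4, 18, 9, 1, 23],
    [5, 13, 8, 5, 7, 14],
    [7, 28, 8, 10, 16, 12],
    [6, 12, 4, 4, 6, 6],
]
TABLE_2 = [  # same organization; different organizations
    [12, 31, 8, 6, 18, 14],
    [13, 26, 30, 22, 12, 41],
]
TABLE_3 = [  # joint planning; action agency reviews; evaluation agency independent
    [17, 29, 7, 9, 21, 16],
    [1, 9, 9, 5, 4, 17],
    [7, 19, 22, 14, 5, 22],
]

for name, table in [("Table 1", TABLE_1), ("Table 2", TABLE_2), ("Table 3", TABLE_3)]:
    stat, df = chi_square(table)
    print(f"{name}: chi-square = {stat:.1f}, df = {df}")
```

Recomputing the statistics this way is also a quick consistency check on the cell counts themselves: a misread frequency would shift the result away from the published value.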
Lastly, with respect to the third aspect, i.e. the working relationship between the evaluation staff and funding agency staff with respect to research decisions, our data indicate that educationalists tend to be more independent of the funding agency (40%). This table is not presented because the overall relationship between the academic discipline of the project director and the working relationship between evaluation and funding staffs was not statistically significant, despite a discernible pattern for persons in education.

Before concluding our description of the independent variables examined, we reviewed one last important dimension. Insofar as we believed that social action programs should have an overall perspective, theory, or frame of reference guiding the program, and further, that the presence of a theoretical framework would facilitate an evaluation of that program by helping to define its boundaries or goals, we examined the distribution of responses to this question for the different academic disciplines. First, we should note that of all evaluations done in FY 70, only 18.3% gave evidence of having a formal social structural or social psychological theory guiding their action program. By contrast, 66% stated that their overall perspective was what we called a social service model, which in effect translated into the notion that 'giving people services is beneficial.' The remaining 16% had no theoretical framework, or what was termed an 'atheoretical naive hypothesis', e.g. that senior citizens make good parole officers (with no information to support the assertion). Looking at educationalists and theory, then, we find that 16% had a formal structural or social psychological theory, 52% a social service model, and 32% no theory at all. In fact, of the total sample, educationalists had the highest proportion in the 'no theory' category. Psychologists, on the other hand, were heavily overrepresented in the 'formal theory' group and underrepresented in the 'no theory' group.
To summarize this section briefly, our data suggest a pattern for educationalists wherein they are very nearly equally divided between conducting evaluations in the same organization as the action component and in different organizations; they lean heavily towards working interdependently with the action staff but independently of the funding agency staff. The pattern is less strong but nonetheless similar for psychologists. With respect to theory, educationalists tend to work on action programs without a guiding theory, while psychologists are just the opposite.

While the descriptive aspect is most informative, the more interesting and pressing question for us was an analytic one, which asked about the quality of the research and how various factors helped to explain some of the variance in that quality. For purposes of our research we defined quality as adherence to the rules of the scientific method, or more specifically the satisfaction of what are considered minimum requirements for good evaluation research. Operationally, we defined quality of research by creating indices composed of several variables which had clustered together when using a factor analytic technique. The alpha for Index A was .42, for Index B .53, and for Index C .69. Clearly these are not high, but recall that these are short indices: Indices A and B contain only three items each. Since our data were separated into two distinct areas, i.e. a) the measurement of whether the program was being carried out in accordance with stated specifications and guidelines (input or process measure, Index A) and b) the measurement of impact or change which occurred as a result of the action program (output or effect measure, Index B), we created two separate indices. Additionally, because it is asserted that the best evaluative research contains both a measure of input (process) and a measure of output (effect), we created a third index, a composite of the variables comprising Indices A and B. This third index is labelled Index C and refers to what we call 'comprehensive evaluations'. Lastly, it is important to note that the variables used in these indices were ordered from high quality to low according to the tenets of science and then treated as if they were interval data for purposes of our analyses. A complete explication of our method of index construction, as well as the component variables, is given in Appendix A.
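To illustrate why short indices yield modest reliability, the sketch below computes Cronbach's alpha from item scores and uses the standardized (Spearman-Brown) form, α = k·r̄ / (1 + (k − 1)·r̄), to back out the average inter-item correlation r̄ implied by each reported alpha. Two assumptions are ours, not the paper's: that the reported alphas behave like standardized alphas, and that Index C has six items (three from each of A and B).

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k score lists, one score per study."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))


def standardized_alpha(k, mean_r):
    """Spearman-Brown form: alpha for k items with average inter-item correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def implied_mean_r(k, alpha):
    """Invert standardized_alpha: the average inter-item correlation implied by alpha."""
    return alpha / (k - (k - 1) * alpha)


# Reported alphas; k = 6 for Index C is an assumption (three items from A, three from B).
for label, k, alpha in [("Index A", 3, 0.42), ("Index B", 3, 0.53), ("Index C", 6, 0.69)]:
    print(f"{label}: implied mean inter-item r = {implied_mean_r(k, alpha):.2f}")
```

Under these assumptions the implied average correlations are roughly .19, .27 and .27, so most of Index C's higher alpha would come from its greater length rather than from more strongly related items: with r̄ near .27, three items give an alpha of about .53 and six items about .69.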

To compute our results we used a regression analysis format, specifically a variation referred to as 'dummy variable regression.' We computed both gross effects and net effects, the former being the total amount of variance in the dependent variable that the independent variable could explain (eta squared, or R²) and the latter being the amount of variance in the dependent variable explained by the independent variable net of (controlling for) all other variables. In computing gross effects we defined each of our indices of quality, Index A, B, and C, as our dependent variables, and then regressed each index on each of the independent variables. A test of significance was done for each R². Additionally, we computed conditional means, i.e. the mean of the dependent variable for each specific response category of every independent variable. In computing net effects we entered all the independent variables which we had hypothesized would be important into one regression equation and obtained an R². We then systematically deleted the set of dummy variables for each independent variable, one set at a time, to get the unique variance or net effect of each independent variable, i.e. the drop in R² when that set is removed. Again, a test of significance was done to determine the statistical significance of each of these net effects. Presented below in Table 4 are the conditional means, gross effects (R²), and net effects for project directors whose academic discipline was education, as well as for some of the other variables discussed earlier in the paper.
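To make the distinction between gross and net effects concrete, here is a small, self-contained sketch of dummy variable regression in plain Python. The gross effect of a categorical variable is the R² from regressing the index on that variable's dummies alone, which equals eta squared computed from the conditional means; the net effect is the drop in R² when that variable's dummy set is deleted from the full equation. The variable names and the ten scores are invented for illustration and are not the survey data. Solving the normal equations directly is adequate for a toy problem, but a numerical library would be preferable in practice.

```python
def solve(a, b):
    """Solve a x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dummies(values, reference):
    """One 0/1 column per category, omitting the reference category."""
    levels = sorted(set(values) - {reference})
    return [[1.0 if v == level else 0.0 for v in values] for level in levels]


def r_squared(y, columns):
    """R^2 of an OLS fit of y on an intercept plus the given columns."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def eta_squared(y, groups):
    """Between-group sum of squares (from conditional means) over total sum of squares."""
    y_bar = sum(y) / len(y)
    between = 0.0
    for g in set(groups):
        scores = [yi for yi, gi in zip(y, groups) if gi == g]
        between += len(scores) * (sum(scores) / len(scores) - y_bar) ** 2
    return between / sum((yi - y_bar) ** 2 for yi in y)


def gross_and_net(y, dummy_sets):
    """Return (full-model R^2, {name: (gross effect, net effect)})."""
    r2_full = r_squared(y, [c for cols in dummy_sets.values() for c in cols])
    effects = {}
    for name, cols in dummy_sets.items():
        others = [c for other, cs in dummy_sets.items() if other != name for c in cs]
        effects[name] = (r_squared(y, cols), r2_full - r_squared(y, others))
    return r2_full, effects


# Invented example: ten studies, a 0-3 quality index, two categorical predictors.
discipline = ["psych", "psych", "psych", "educ", "educ", "other", "other", "other", "educ", "psych"]
arrangement = ["same", "diff", "same", "same", "diff", "diff", "diff", "same", "same", "diff"]
y = [3, 3, 2, 2, 2, 1, 2, 1, 1, 3]

r2_full, effects = gross_and_net(
    y, {"discipline": dummies(discipline, "other"), "arrangement": dummies(arrangement, "diff")}
)
print(f"full R^2 = {r2_full:.3f}")
for name, (gross, net) in effects.items():
    print(f"{name}: gross = {gross:.3f}, net = {net:.3f}")
print(f"eta^2 for discipline = {eta_squared(y, discipline):.3f}")
```

Because the full and reduced equations are nested, a net effect computed this way can never be negative, whereas it can be larger or smaller than the corresponding gross effect depending on how the predictors overlap.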
You will note that net effects have not been computed for budget (funds allocated). This was because, quite surprisingly, the size of the award did not seem to account for any of the variance in the quality of the research (note the R² for funds allocated). Additionally, before constructing our indices we computed measures of association between each of the independent variables, e.g. budget, and each of the dependent variables that were later included in each index, and there too budget did not vary significantly or systematically with any of the dependent variables. Thus it was not included in the regression equations used to compute net effects.

The indices of quality were constructed such that persons could score from zero to three on Indices A and B and from zero to six on Index C, with higher scores indicating better research quality. Note that on the measurement of input (process) the average quality score was 2.19. While it may then appear that most studies were of fairly high quality, it must be emphasized that the index was composed of items which reflected the satisfaction of minimum scientific criteria. For example, if in a study assessing the implementation and effectiveness of a remedial reading program one had taken a simple random sample of both teachers and the target student population, used multivariate statistical techniques to analyze the data, and thus categorized oneself as having done a quantitative study, a perfect score of 3 would have been attained on Index A. Thus a mean score of 2.19 on Index A indicates that the quality of evaluation studies in general was not particularly high. In fact, only 25% of all evaluations which assessed input (N = 185) scored a 3, while 9% scored a 1 or less. Similarly, for Index B, which contained items rating research design, sampling frame, and adequacy of measurement, a perfect score of 3 was obtained by only 19% of those who assessed impact (N = 182); the mean was 1.80, and 27% scored 1 or less. The mean for Index C, which was a composite of A and B and was computed only for those 152 studies which measured both process and impact, was 4.08, with 11% attaining a 6 and 11% receiving a score of 3 or less. The median value for Index A is 2.25, for Index B 2.0, and for Index C 4.21; these medians, while only slightly higher than the mean scores, lend further credence to the assertion that many of the evaluation studies were using less than adequate methodology.
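The scoring logic described above can be sketched as follows. The item names are hypothetical, and scoring each item between 0 and 1 is our assumption, made only so that three items span the 0-3 range; the paper's actual item coding is given in its Appendix A. The sketch mirrors two features of the text: the remedial-reading example earns a perfect 3 on Index A, and Index C is defined only for studies that measured both process and impact.

```python
from statistics import mean, median
from typing import Optional


def index_a(random_sample: float, multivariate: float, quantitative: float) -> float:
    """Process (input) index, 0-3: sum of three items, each assumed scored 0..1."""
    return random_sample + multivariate + quantitative


def index_b(design: float, sampling_frame: float, measurement: float) -> float:
    """Impact (output) index, 0-3, built the same way."""
    return design + sampling_frame + measurement


def index_c(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Comprehensive index, 0-6; defined only when both process and impact were measured."""
    if a is None or b is None:
        return None
    return a + b


# The remedial-reading example: random sampling, multivariate statistics and a
# self-described quantitative study give the maximum process score.
print(index_a(1, 1, 1))

# Invented (A, B) pairs; None marks an index the study did not measure.
studies = [(3, 2), (2, None), (2.5, 1.5), (None, 1), (1.5, 2)]
c_scores = [index_c(a, b) for a, b in studies if index_c(a, b) is not None]
print(c_scores, mean(c_scores), median(c_scores))
```

Restricting Index C to studies with both measures is why its base (152 studies) is smaller than the bases for Index A (185) and Index B (182).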

Now let us examine more closely how well educationalists fare when compared to project directors from other academic disciplines. The eighth entry in Table 4 indicates that psychologists do far better evaluation research than persons from any of the other academic disciplines. On all three indices of quality, psychologists score roughly one-half standard deviation above the mean. While educationalists turn out to have the second highest scores on two of the three indices, it must be noted that in each of these cases their scores are slightly below the overall mean, indicating how poorly, relative to psychologists, the other disciplines, including educationalists, are doing. Moreover, to strengthen our findings here, we examined the relationship to quality of research of the principal investigator's academic discipline, as well as that of the person most essential to the evaluation effort on a daily basis, and in both instances found the pattern to be the same. While the order of disciplines other than psychology varied, psychologists remained constant as the single group doing the best evaluation research. And so now we may ask, "Should Evaluation Researchers in Education Have An Inferiority Complex?" The answer, like most answers to scientific questions, is unclear. If one's reference group is psychologists, it seems clear that those educationalists doing evaluation research are doing less well. On the other hand, another way of interpreting the data is to assert that educationalists doing evaluation research are doing it no worse than those in other disciplines, with the exception of psychologists. We suggest that now is the time to recall that some psychologists are educational psychologists and as such are apparently doing good research. Unfortunately, we do not know how many are involved here.

One can only speculate as to why psychologists do better research than
persons in other disciplines. It is common knowledge that training in
psychology for the Ph.D. includes courses in experimental design, statistics
and measurement. Additionally, experimentation has a long history in
psychology, whereas experimentation is virtually unknown in the field of
economics, and sociologists (with the exception of social psychologists) are
relatively unfamiliar with it as well. While there has been much emphasis
placed on the need for experimentation in the evaluation of educational
programs (e.g., Campbell, D. T. and Stanley, J. C. (1963); Trow, M. (1971)),
our impression is that actual research training in experimentation is still
more likely to occur in psychology than in education.

A cautionary note should be interjected here. As the data in Table 4
indicate, only 8% of the variance in Index A, 12% in Index B, and 14% in
Index C is explained by the project director's academic discipline. Even
more important, most of the effects of this variable can be removed when all
the other independent variables are controlled. However, we would like
neither to understress nor to overstress this last statement, since we are
still in the process of trying to determine causal effects. This is
particularly difficult since our own research is obviously not experimental
and there is a substantial amount of collinearity among the independent
variables. The point is that among our independent variables, academic
discipline has some of the largest gross effects on quality of research, and
yet it explains only a relatively small amount of variation in research
quality, and even these effects can be substantially reduced when other
independent variables are controlled.
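The distinction between gross and net effects drawn above can be illustrated with a small sketch. The numbers are hypothetical and are not the survey's data: a predictor x (standing in, say, for a discipline-related variable) is correlated with a second predictor z, while the outcome y depends only on z. Regressed alone, x shows a sizeable gross slope; once z is controlled, its net slope falls to zero. The helper names are our own, and the two-predictor solve is an ordinary least-squares calculation via the normal equations, not necessarily the authors' exact procedure.

```python
def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(a, b):
    """Sum of products of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def gross_slope(y, x):
    """Bivariate least-squares slope of y on x (no controls)."""
    xc, yc = center(x), center(y)
    return cross(xc, yc) / cross(xc, xc)


def net_slopes(y, x1, x2):
    """Least-squares slopes of y on x1 and x2 jointly (each controlling the other).

    Solves the 2x2 normal equations on mean-centered data by Cramer's rule.
    """
    yc, c1, c2 = center(y), center(x1), center(x2)
    s11, s22, s12 = cross(c1, c1), cross(c2, c2), cross(c1, c2)
    s1y, s2y = cross(c1, yc), cross(c2, yc)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical illustration only: y depends on z alone, and x is correlated with z.
z = [0, 1, 2, 3, 4, 5]
x = [1, 0, 2, 3, 3, 6]
y = [2 * v for v in z]

gross = gross_slope(y, x)           # about 1.63: x appears to matter when z is ignored
net_x, net_z = net_slopes(y, x, z)  # 0.0 and 2.0: x's effect vanishes once z is controlled
```

The more strongly x and z are correlated, the closer the determinant gets to zero and the less stable the net estimates become, which is one reason collinearity among the independent variables makes causal interpretation difficult.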

While time does not permit us here to elaborate the theoretical postulations
which might serve to explain why most of the other variables shown in
Table 4 also account for some of the variance in research quality, we can
give a very brief summary of our tentative conclusions. With respect
to what may be termed the 'administrative domain', we note that studies of the
highest quality tend to be sponsored by NIH/NIMH, awarded in the form of
grants, to evaluate action programs focused on the areas of health or mental
health. Regarding what may be termed an 'organizational domain', these same
awards tend to be given to persons affiliated with educational institutions
who conduct their evaluations within the same organization as the one
administering the action program, and who make major research decisions
independently of the funding staff and interdependently with the action staff.
Lastly, the highest quality seems to be found in studies long in duration,
where the evaluations are of action programs guided by some formal theory, and
where the project director's academic discipline is psychology. All of this
must be qualified by the fact that there tend to be moderate to high
associations among all of these categories of independent variables.

In summation, the general quality of research in evaluation is not
high, and the evaluation research done by the 'educationalists' identifiable
in our sample does not appear to be high either.
Methods and Research, 1972, vol. 1, pp. 13-37.

11. For a discussion of this technique see Cohen, Jacob, "Multiple Regression as
a General Data-Analytic System", Psychological Bulletin, 1968, vol. 70,
no. 6, pp. 426-443.

12. Eta squared is equal to the 'between sum of squares' over the 'total sum of
squares', as shown in Blalock, Hubert, Social Statistics, 1972, 2nd edition,
McGraw Hill, N.Y., p. 413. Eta squared is also equal to the squared multiple
correlation (R²) obtained when the categories are entered into a regression as
dummy variables.
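The definition in note 12 can be checked numerically. The sketch below uses hypothetical index scores grouped by discipline (not the survey's data) and computes eta squared as the between-groups sum of squares over the total sum of squares. A least-squares regression on a full set of category dummies predicts each case's group mean, so R² computed from those fitted values (one minus the residual sum of squares over the total) gives the same number. Function names are illustrative.

```python
def eta_squared(groups):
    """Between-groups sum of squares divided by the total sum of squares."""
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total


def r_squared_from_group_means(groups):
    """R^2 when each case is predicted by its own group mean.

    These group means are the fitted values of least squares on a full set of
    category dummies.
    """
    values = [v for g in groups.values() for v in g]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_residual = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return 1 - ss_residual / ss_total


# Hypothetical Index A scores by discipline, for illustration only.
scores = {
    "psychology": [3.0, 2.5, 3.0],
    "education": [2.0, 2.5, 1.5],
    "sociology": [1.0, 2.0, 1.5],
}
eta2 = eta_squared(scores)  # 0.7, up to floating-point rounding
r2 = r_squared_from_group_means(scores)
```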
We constructed our indices in the following ways:

Index A - Quality of Measurement of Process

Using the variables Sampling, where non-systematic non-random,
non-systematic random, and random or non-random cluster samples
received a 0, and stratified random, simple random or all
observed received a 1; Type of Data Analysis, where no statistics,
ratings, or impressions received a 0, narratives or
impressionistic summaries received a 1, ratings from qualitative
data received a 2, simple descriptive statistics received a 3, and
multivariate statistics received a 4; and Nature of Data
Analysis, where qualitative analyses received a 0, an analysis
evenly divided between qualitative and quantitative received
a 1 and quantitative analyses received a 2, we weighted each item
by 1/(m - 1), where m was the number of response categories.

For example, an evaluation wherein the measurement of process involved the
use of a simple random sample ((1/1)(1) = 1) and multivariate statistical
procedures ((1/4)(4) = 1), and whose analysis was predominantly quantitative
((1/2)(2) = 1), would receive a total score of 3 on Index A. That is, the
total score was equal to the sum of the products of the response code times
the weight for each item.
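The weighting rule can be written out as a short sketch. The item names and dictionary layout below are our own labels for the three Index A variables described above; each item's response code is multiplied by 1/(m - 1), where m is that item's number of response categories, and the products are summed. Exact fractions are used so that the worked example comes out to exactly 3.

```python
from fractions import Fraction

# Number of response categories (m) for each Index A item, per the coding above.
# Item names are illustrative labels, not taken from the original questionnaire.
INDEX_A_ITEMS = {
    "sampling": 2,            # 0 = non-systematic or cluster, 1 = stratified/simple random or all observed
    "type_of_analysis": 5,    # 0 = no statistics ... 4 = multivariate statistics
    "nature_of_analysis": 3,  # 0 = qualitative, 1 = evenly divided, 2 = quantitative
}


def index_score(codes, items):
    """Sum of response code times 1/(m - 1) over all items of an index."""
    total = Fraction(0)
    for name, m in items.items():
        code = codes[name]
        if not 0 <= code <= m - 1:
            raise ValueError(f"{name}: code {code} outside 0..{m - 1}")
        total += Fraction(code, m - 1)
    return total


# The worked example: simple random sample, multivariate statistics, quantitative analysis.
example = {"sampling": 1, "type_of_analysis": 4, "nature_of_analysis": 2}
perfect = index_score(example, INDEX_A_ITEMS)  # Fraction(3, 1)
```

Dividing each code by m - 1 rescales every item to run from 0 to 1, so each of the three variables contributes equally to the maximum score of 3 regardless of how many response categories it has.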



While there may be some debate as to the order we have imposed here, i.e.,
quantitative as higher than half quantitative and half qualitative, we feel
justified in so doing, since most of the current literature on evaluation
research methods (e.g., Suchman, E.A., Evaluative Research, 1967, Russell
Sage, N.Y.; Caro, F. (ed.), Readings in Evaluation Research, 1971, Russell
Sage, N.Y.; Rossi, P. and Williams, W., Evaluating Social Programs, 1972,
Seminar Press, N.Y.; and Sheldon, E.B. and Bernstein, I.N., "Methods of
Evaluative Research", in Social Science Methods, (ed.) Robert Smith, 1973,
Free Press, N.Y.) strongly suggests that the best evaluations in terms of
research quality are those which are highly quantitative.
Index B - Quality of Measurement of Impact or Output

Using the variables Nature of Research Design, where
descriptive studies received a 0, comparative, longitudinal
or cross-sectional studies without randomization or
control received a 1, experimental designs without both
randomization and control received a 2, and experimental
designs with randomization and control groups received a
3; Representativeness of the Sample, where haphazardly
drawn samples received a 0, moderately representative
samples received a 1, and representative samples received
a 2 (see note 17); and Quality of the Measurement Procedures,
where those responses judged as reflecting adequate measurement
received a 1 and less adequate received a 0, an index was
constructed. The judgment as to the adequacy of measurement
was based on the principles of good measurement as
stated in Attitude Measurement, edited by Gene Summers (1969),
and on the apparent good fit between the response given, which
cited the criteria on which impact or change were studied,
and how that was measured. Primarily, good fit was assessed on
the basis of content validity.



17. This was one of the questions which required a judgmental procedure. The
process followed was similar to the coding used on another question concerning
the adequacy of sampling. That is, those evaluations which drew simple random
or stratified random samples from the populations to which they wished to
generalize their findings were coded representative, or 2. Those which were
systematic non-random, or random or non-random cluster samples, were coded
moderately representative, or 1, and those which were non-systematic non-random
were coded haphazard, or 0. While the first two are clearly defined in Blalock,
Hubert M., Social Statistics, N.Y., 1960, McGraw-Hill, pp. 392-410, the
following example may help clarify a case of the latter: an evaluation of the
effect of a referral program for first offenders was based on a sample of the
experience of the first thirty cases referred to the agency.

The judgment of whether a measure had adequate content validity used the
definition of content validity as it appears in Kerlinger, Fred, Foundations of
Behavioral Research, 1964, N.Y., Holt, Rinehart and Winston, pp. 444-447. An
example of a response which was coded adequate was: "the criteria by which the
effectiveness of an educational program aimed at increasing cognitive ability
of mentally retarded children [was assessed] was the use of standardized reading
comprehension, vocabulary, and arithmetic tests, all of which had been pretested
for reliability on other similar target populations. Five repeated measures
were taken over a 2 year period."
Again, as in Index A, we weighted each item by 1/(m - 1), where m was the
number of response categories for that item. Thus, as in Index A, the total
scores for a respondent on Index B ran from zero to three.



Index C - Comprehensive Evaluations: Quality of 
Measurement of Process and Impact 

Using the variables Sampling, Type of Data Analysis, Nature of
Data Analysis, Nature of Research Design, Representativeness of
the Sample and Quality of Measurement Procedures, ordered as they
were in Indices A and B, we constructed the index by weighting
each item by 1/(m - 1), where m was the number of response
categories for that item. Thus the total scores for a respondent
on Index C ranged from zero to six.
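Because Index C simply pools the six items of Indices A and B under the same 1/(m - 1) weights, its range follows directly from theirs. The sketch below, with our own illustrative item names and the category counts implied by the coding described above, enumerates every possible combination of response codes to confirm that Index B runs from zero to three and Index C from zero to six, and that a study's Index C score equals the sum of its Index A and Index B scores.

```python
from fractions import Fraction
from itertools import product

# Response categories (m) per item; names are illustrative labels.
INDEX_A_ITEMS = {"sampling": 2, "type_of_analysis": 5, "nature_of_analysis": 3}
INDEX_B_ITEMS = {"research_design": 4, "representativeness": 3, "measurement_quality": 2}
INDEX_C_ITEMS = {**INDEX_A_ITEMS, **INDEX_B_ITEMS}  # all six items


def index_score(codes, items):
    """Sum of response code times 1/(m - 1) over the items of one index."""
    return sum((Fraction(codes[name], m - 1) for name, m in items.items()), Fraction(0))


def score_range(items):
    """Lowest and highest attainable scores, found by enumerating every code combination."""
    names = list(items)
    scores = [
        index_score(dict(zip(names, combo)), items)
        for combo in product(*(range(items[n]) for n in names))
    ]
    return min(scores), max(scores)


# A hypothetical study that measured both process and impact.
study = {
    "sampling": 1, "type_of_analysis": 3, "nature_of_analysis": 2,
    "research_design": 1, "representativeness": 1, "measurement_quality": 1,
}
a = index_score(study, INDEX_A_ITEMS)  # 1 + 3/4 + 1 = 11/4
b = index_score(study, INDEX_B_ITEMS)  # 1/3 + 1/2 + 1 = 11/6
c = index_score(study, INDEX_C_ITEMS)  # equals a + b
```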